Construction of a high-density genetic map and dissection of genetic architecture of six agronomic traits in tobacco (Nicotiana tabacum L.)

Tobacco (Nicotiana tabacum L.) is an economic crop and a model organism for studies on plant biology and genetics. A population of 271 recombinant inbred lines (RIL) derived from K326 and Y3, two elite flue-cured tobacco parents, has been constructed to investigate the genetic basis of agronomic traits in tobacco. Six agronomic traits including natural plant height (nPH), natural leaf number (nLN), stem girth (SG), inter-node length (IL), length of the largest leaf (LL) and width of the largest leaf (LW) were measured in seven environments, spanning the period between 2018 and 2021. We firstly developed an integrated SNP-indel-SSR linkage map with 43,301 SNPs, 2,086 indels and 937 SSRs, which contained 7,107 bin markers mapped on 24 LGs and covered 3334.88 cM with an average genetic distance of 0.469cM. Based on this high-density genetic map, a total of 70 novel QTLs were detected for six agronomic traits by a full QTL model using the software QTLNetwork, of which 32 QTLs showed significant additive effects, 18 QTLs showed significant additive-by-environment interaction effects, 17 pairs showed significant additive-by-additive epistatic effects and 13 pairs showed significant epistasis-by-environment interaction effects. In addition to additive effect as a major contributor to genetic variation, both epistasis effects and genotype-by-environment interaction effects played an important role in explaining phenotypic variation for each trait. In particular, qnLN6-1 was detected with considerably large main effect and high heritability ( ha2 =34.80%). Finally, four genes including Nt16g00284.1, Nt16g00767.1, Nt16g00853.1, Nt16g00877.1 were predicted as pleiotropic candidate genes for five traits.


Introduction
Tobacco (Nicotiana tabacum L.) is an important economic crop widely grown throughout the temperate agricultural zones in the world. According to FAO-statistics, approximately 5.9 million tons of tobacco were produced worldwide in 2020, and China accounted for 36.3% of the total yields (Food and Agriculture Organization of the United Nations, 2020). Tobacco products are mainly derived from leaves, which are post-processed and consumed as cigarettes, cigars, pipes, and chewing tobacco, and cultivating high-yield and high-quality tobacco leaves will increase the economic value of the tobacco industry (Wu et al., 2020). Therefore, genetic improvements of the agronomic traits related with leaves are of great importance for tobacco industry. In addition to being an essential crop in agriculture, tobacco is also a typical model plant for genetic study, pathology investigation and biotechnology (Liu et al., 2013;Sun et al., 2015;Wang and Bennetzen, 2015;Wang and Balint-Kurti, 2016;Xiu et al., 2016;Bao et al., 2017;Pagliano et al., 2017;Lee and Kim, 2018). However, the genetic mechanisms underlying important agronomic traits in tobacco are not well-characterized owing to the complexity of genome and limitation of molecular genetic resources (Julio et al., 2006;Lewis et al., 2007).
As a member of the family Solanaceae, tobacco is an allotetraploid originated from two ancestral parents Nicotiana sylvestris (2n = 24) and Nicotiana tomentosiformis (2n = 24) about 200,000 years ago (Leitch et al., 2008;Sierro et al., 2013;Sierro et al., 2014). The genome of tobacco is massive (~4.5 Gb) and complex, with numerous repetitive sequences (Zimmerman and Goldberg, 1977). Advances in sequencing technologies and genome assembly methods have allowed for the generation of a nearly complete reference genome for allotetraploid tobacco (Sierro et al., 2013;Sierro et al., 2014;Edwards et al., 2017;Thimmegowda et al., 2018). In addition, efforts to develop more molecular genetic markers, from microsatellites or simple sequence repeats (SSRs) to single nucleotide polymorphisms (SNPs), have made major achievements, which enabled the construction of high-density linkage maps (Bindler et al., 2011;Tong et al., 2012;Xiao et al., 2015;Gong et al., 2016;Tong et al., 2016;Thimmegowda et al., 2018;Tong et al., 2020). For example, Xiao et al. (2015) developed 4138 SNP markers using next-generation RAD sequencing and constructed a genetic map covering a total length of 1944.74 cM with a population of 193 backcross individuals. Recently, a genetic map with 45,081 SNPs has been constructed by whole genome sequencing using a tobacco population including 274 accessions (Tong et al., 2020). However, construction of a higher density genetic map is still necessary and valuable for more in-depth genetic dissection of complex traits. Therefore, based on the same population, we discovered more polymorphic markers, including indels (insertions or deletions) and SSRs, to the published SNPs (Tong et al., 2020), and constructed an integrated linkage map. This higher-density linkage map will provide a more solid foundation for fine mapping of quantitative trait locus (QTL), further functional analysis by bioinformatics tools and dissection of complicate genetic architecture in tobacco.
Because of high-density and high-quality genetic maps unavailable, QTL mapping studies in tobacco have lagged behind many other crops, including maize (Liu et al., 2014;Zhang et al., 2017), rice (Dixit et al., 2017;Hu et al., 2018), and wheat (Prat et al., 2017;Tura et al., 2020). Up to now, the conducted QTL mapping studies in tobacco were mainly related to the traits diseases resistance (e.g. black shank, brown spot and soilborne disease) (Tong et al., 2012;Drake-Stowe et al., 2017;Zhang et al., 2018;Cheng et al., 2019;Ma et al., 2019), agronomic performance (e.g. various morphological traits) (Lewis et al., 2007;Wu et al., 2014;Cheng et al., 2015;Tong et al., 2021;Liu et al., 2022) and leaf quality (e.g. chemical components and smoke properties) (Julio et al., 2006;Vijay et al., 2010). Particularly, for important agronomic traits associated with yield, few QTLs of tobacco have been reported so far (Lewis et al., 2007;Cheng et al., 2015;Tong et al., 2021;Liu et al., 2022). Moreover, those studies of QTL mapping mainly focused on the QTL effects, epistatic effects were barely investigated as well as their interactions with environments. However, epistasis has long been recognized to be fundamentally important for revealing genetic architecture of quantitative traits, identification and estimation of interaction effects of epistatic loci will benefit to improve predictions of heterosis in crop species (Mackay, 2014). In further, it has been known that the magnitude of gene effects is subjected to the difference of environments, that is depicted as interaction of gene by environment, which plays an important role in determining the environmental adaptability and genetic stability of varieties ; Therefore, exploring the environment-specific effects of QTL is indispensable in QTL study on complex traits.
In this study, we developed a new integrated SNP-indel-SSR linkage map for a population of tobacco RILs derived from K326 and Y3 with 274 individuals. Based on the high-density linkage map and multi-environment phenotypic data of the RIL population, QTL mapping was conducted for six agronomic traits; the detected maineffect QTLs, epistasis QTLs and their interactions with environments provided more insights into the genetic architecture of the traits; these results will greatly facilitate the molecular improvements of breeding target traits in tobacco.

Plant materials and field trial
The recombinant inbred lines (RILs) were generated from two elite flue-cured tobacco parents Y3 and K326. Y3 is a backbone cultivated variety that originated from Zimbabwe with elite agronomic traits and complicated parental sources. K326, whose genome has been assembled (Edwards et al., 2017), was introduced from America with high commercial quality and disease resistance but moderate agronomic performance. A total of 274 genotypes were employed in this study, consisting of two parents, one F 1 generation individual (YKF 1 ; Y3 × K3K6) and 271 F 7 generation individuals. The materials were planted at Yanhe (N: 24.35; E: 102.54) and Shilin (N: 23.46; E: 103.17) field experiment stations using complete random design with 5 replications, and were cultivated according to local technical measures for quality tobacco production. Six agronomic traits, including natural plant height (nPH), natural leaf number (nLN), stem girth (SG), inter-node length (IL), length of the largest leaf (LL) and width of the largest leaf (LW), were measured in 2018, 2019, 2020 and 2021 years in Yanhe, and in 2018, 2019 and 2020 in Shilin. Seven different combinations of location and year were treated as environments denoted as E1 (2018 Shilin), E2 (2018 Yanhe), E3 (2019 Shilin), E4 (2019 Yanhe), E5 (2020 Shilin), E6 (2020 Yanhe) and E7 (2021 Shilin). Six agronomic traits were investigated on the day 65 after planting in the field (first green fruit stage) according to the tobacco industry standard YC/T 369-2-10.

Statistical analysis of phenotypes
Variance components analysis and heritability estimation were performed based on the following mixed linear model, where y khi is the phenotypic value of the i-th replication of the k-th individual in the h-th environment; m is the population mean; g k is the genotypic value of the k-th genotype, random effect, subjected to a normal distribution g k e N(0, s 2 g ); ge kh the effect of the h-th environment, random, subjected to the normal distribution e h e N(0, s 2 e ); ge kh the interaction effect between the k-th genotype and the h-th environment, subjected to the normal distribution ge kh e N(0, s 2 ge ), random effect; e khi is the residual effect of the individual, random, ϵ khi e N(0, s 2 ϵ ). The mmer module of sommer R package (Covarrubias-Pazaran, 2016) was applied to estimate the variances of random effects (ŝ 2 g ,ŝ 2 e , ŝ 2 ge ,ŝ 2 ϵ ) and to predict the random effects by BLUPs (best linear unbiased predictions,ĝ k ,ê h ,ĝe kh ) by solving the mixed model equation (MME). Broad sense heritability was estimated with the formulaĤ 2 =ŝ 2 g =(ŝ 2 g +ŝ 2 ge +ŝ 2 ϵ ), thereŝ 2 g was the estimated genotypic variance, ŝ 2 e was the estimated environmental variance,ŝ 2 ge was the estimated variance due to genotype-by-environment interaction, and ŝ 2 ϵ was the estimated residual variance. The rcorr module of Hmisc R package (https://cran.r-project.org/web/packages/ Hmisc/index.html) was employed to calculate Pearson correlation coefficients between six studied traits: (1) phenotypic correlation coefficients with y khi for each environment, respectively; (2) genetic correlation coefficients withŷ k . (ŷ k =m +ĝ k ), whereŷ k is the corrected genotypic value of the k-th individual by population mean,m is the estimated population mean, andĝ k is the genotypic value of the k-th individual predicted by BLUP.

DNA extraction and sequencing
The genomic DNA of 274 samples (271 RIL F 7 individuals, the parents, and one F 1 generation) was extracted from the leaves of 7week-old tobacco seedlings by Qiagen DNeasy Plant Mini Kits (Qiagen, Hilden, Germany) and sonicated into~350 bp fragments via a Covaris M220 system (Covaris, Woburn, MA. USA). The paired end DNA fragments were used to generate DNA Nano Balls via LM-PCR and rolling circle amplification. The sequencing library was constructed according to the requirements of the BIGSEQ-500 platform and sequenced with PE100 (Li et al., 2018).

SNP/Indel calling and filtering, SSR marker resources
SNPs and indels were called using GATK (GATK3.3.0) and filtered according to the following criteria (McKenna et al., 2010): For SNPs, (1) SNPs of the parents and F 1 generation were filtered based on DP ≥10 and GQ ≥25.
(2) SNPs of the 271 RIL F 7 individuals were filtered based on DP ≥3. (3) SNPs that were absent in the parents and F 1 generation were removed. (4) SNPs that exhibited a missing rate of over 20% in the 271 RIL F 7 individuals were eliminated. (5) Triallelic or tetraallelic sites were deleted. For indels, (1) Indels of the parents and F 1 generation were filtered based on DP ≥10 and GQ ≥40.
(3) Indels that were absent in the parents and F 1 generation, were removed. (4) Indels that exhibited a missing rate of over 20% in the 271 RIL F 7 individuals were eliminated. (5) Triallelic or tetraallelic sites were deleted. (6) Indels over 5bp were removed. After quality controls, there were 10,057,282 high-quality SNPs and 569,946 indels remained for subsequent analysis. Then, we screened out 1,626,811 SNPs and 71,130 indels which exhibited two different homozygous genotypes in two parents (aa and bb) and heterozygous (ab) in F 1 (hereinafter referred to as aa × bb & F 1 -h types); consequently, high quality linkage map will be guaranteed after eliminating the influence of uncertainty in genotype because of the high heterozygosity of Y3.
A total of 937 SSRs were provided by Tong et al. (2016) and the primer pairs of SSRs were presented in the Supplementary Table 1.

Genetic linkage map
The number of the markers (1,626,811 SNPs and 71,130 indels) that were aa × bb & F 1 -h types was over whelming for most software used to construct genetic linkage maps. Therefore, we clustered the markers that mapped on the same scaffold, kept one marker per 100 Kb as the low recombination rate within a short physical distance, and selected out 44,804 SNPs and 2,163 indels. Together with 937 published SSRs, a total of 47,904 markers were selected uniformly for genetic linkage map construction in software LepMap3 (Rastas, 2017). In detail, 47,814 markers were identified as informative via the ParentCall2 module, and anchored to 24 LGs using the SeparateChromsomes2 module (LOD = 83). The genetic distance of each LG was calculated by the OrderMarkers2 module. The genetic linkage density map was visualized using R package in LinkageMapView (Ouellette et al., 2018).

Genetic model and statistical methods for QTL mapping
A saturated genetic model was adopted for modeling the genetic architecture of complex traits from multi-environment trials, which includes additive effect (a) of each QTL, additive-by-additive epistatic effect (aa) of each paired QTLs, treated as fixed effects, and their corresponding environment interaction effects (ae and aae) as random effects. Suppose a trait is controlled by s segregating QTL in which t pair of QTL involved interaction. Then, the phenotypic value of the m-th replication of the k-th genotypes in the h-th environment (y hkm ) can be expressed by the following mixed linear model, where, m is the population mean; a i is the additive effect of the i-th QTL with coefficient x ik , fixed effect; aa ij is the additive-by-additive epistatic effect of the i-th QTL and the j-th QTL with coefficient x ik x jk , fixed effect; e h is the main effect of the h-th environment, random effect, e h e (0, s 2 E ); ae hi is the additive-by-environment interaction effect of the i-th QTL and the h-th environment with coefficient u hik (=x ik )random effect, ae hi e (0, s 2 A i E ); aae hij the interaction effect of the aa ij and the h-th environment with coefficient u hijk (=x ik x ik ), random effect, aae hij e (0, s 2 AA ij E ); and e hkm is the residual effect of the individual, random, ϵ hkm e (0, s 2 ϵ ). QTLNetwork 2.0 software were employed to detect possible QTLs by the mixed-linear-model-based composite interval mapping (MCIM) method (Yang et al., 2008). One-and two-dimensional genome scans for QTLs were performed using a 5 cM testing window, a 0.5 cM walk speed and a 5 cM filtration window size. To control the experiment-wise type I error rate, a critical F-value based on the Henderson III method was determined by the permutation test with 1,000 times for each tested locus at the significance level of 0.05. Based on the significant QTL, A QTL full model was established and used to estimate each parameter based on the samples generated by Markov chain Monte Carlo (MCMC) with 20,000 Gibbs sampler iterations.

Phenotypic performance of six agronomic traits
For the six agronomic traits, the estimated heritability fluctuated from 12.57% (LL) to 57.72% (nLN), indicating that traits like nLN and nPH were more stable across environments, compared with the traits like LL and LW (Table 1). Most phenotypic correlations of the traits exhibited positive direction and reached statistically significant (a=0.05) ( Figures 1A-G), while the correlation coefficient between LN and IL showed negative in E1 ( Figure 1A), so as LN and LL in E7 ( Figure 1G). The overall pattern fluctuated slightly and remained consistent in seven environments. The genetic correlation coefficients, which were calculated using the estimated genotypic values, exhibited the highest 0.79 between nLN and SG, followed by 0.63 between IL and LW, and a series of correlations between nPH and the other traits (apart from LL) over 0.5 ( Figure 1H). Especially, we found that the phenotypic correlation between nLN and SG were shown relatively high in all seven environments, the high underlying genetic correlation might explain for it.

Genetic linkage map
We finally obtained 46,324 markers to generate a high-resolution genetic linkage map that contained 7,107 bin markers (defined as the marker with least genotype missing rate when there were multiple markers at the same genetic distance) mapped on 24 LGs and covered 3334.88 cM with an average genetic distance of 0.469cM (Figure 2 and Table 2). The length of the LGs ranged from 74.94 cM to 198.21 cM, of which the shortest was LG24 containing 122 bin markers (0.614 cM/locus) and the longest was LG16 holding 406 bin markers (0.488 cM/locus). The number of bin markers on each LG varied from 122 to 436. Thus far, this genetic linkage map harboring 46,324 markers (7,107 bin markers) is the highest-resolution map reported in tobacco. Considering of the size of RIL population, the distribution density of the markers and the limitation of the software for QTL mapping, we then removed markers with adjacent distance less than 1cM, and retained 2,139 bin markers containing 1,918 SNPs, 76 indels and 145 SSR markers harboring on the 24 LGs for the subsequent QTL analysis (Supplementary Table 2).

Additive and additive-by-environment interaction effects
According to the critical F-values to declare QTLs with significant single locus effects (called single-locus QTL hereafter) or to declare epistatic QTLs with significant epistatic effects (called epistatic QTL hereafter), a total of 70 QTLs consisting of 33 single-locus QTLs (Table 3) and 37 epistatic QTLs (Table 4) were mapped on 18 LGs, of which, 5 single-locus QTLs and 4 epistatic QTLs were detected for IL, 7 and 4 for LL, 6 and 5 for LW, 4 and 11 for nLN, 7 and 6 for nPH, 4 and 7 for SG, respectively (Supplementary Table 3). LG6 contained the most QTLs (19 QTLs), followed by LG4 (8 QTLs) and LG16 (7 QTLs), while LG11, LG13, LG19 contained the fewest QTLs (only 1 QTL on each).
Totally, 33 QTLs with additive (a) main effects and/or additiveby-environment interaction (ae) effects were detected for six traits. Most of the QTLs exhibited small additive effects, which were regarded as minor-effect QTLs, and about 70% of them explained less than 2% of the phenotypic variance (Table 3). The average proportion of phenotypic variance explained by a QTL (h 2 a ) was 3.23%. For LL, there were seven QTLs contributing small additive effects to phenotypic variation, indicating that the trait LL was under control of the minor-effect polygenes.
Nevertheless, there were seven QTLs with relatively large effects found in five traits, namely, qnLN6-1 (h 2 a = 34.80%), qnPH6-5 (h 2 a = 12.68%), qSG6-1 (h 2 a =10.34%), qLW6-2 ( h 2 a = 6.05%), qnLN6-3( h 2 a = 5.04%), qIL6-3 ( h 2 a = 4.80%) and qnPH6-2 ( h 2 a = 4.21%; Table 3), which played a major role in determining the variation of the corresponding traits. Intrinsically, we speculated that traits like nLN, nPH, SG, LW and IL were controlled by one or two major QTL(s)/gene(s). It was interesting that all seven major QTLs were distributed in two regions on LG6, wherein qnLN6-1, qSG6-1 and qnPH6-2 being mapped together on the position of 14.3 cM, whereas qLW6-2 and qnPH6-5 on the position of 166.3 cM; and the homozygous genotypes of the alleles from the parent K326 (QQ) at these seven QTLs all contributed negative effects (i.e., decreasing the trait value); in contrast, those of the alleles from the parent Y3 (qq) contributed positive effects (Supplementary Table 4). It was in accordance with the known fact that Y3 was an elite variety in agronomic performance.
Besides, some regions of the linkage groups harbored more than one QTL for different traits (Supplementary Table 5 Phenotypic and genetic correlations between traits in the RIL population. Heat map (A-G) showed phenotypic correlation coefficients between six traits in E1 (2018 Shilin) -E7 (2021 Shilin) in turn, the heat map (H) showed genetic correlation coefficients between six traits. *,**,*** denote significance level at 0.05, 0.01 and 0.005, respectively. Traits abbreviations are same as those in Table 1. studied traits (Figure 1). For example, qnLN6-1and qSG6-1were located exactly on the same genetic position of the linkage map, so as qnLN6-3and qSG6-2, which verified the existence of stronger genetic correlation up to 0.79 between nLN and SG. Although significant additive by environment interaction effects (ae) were detected for more than half of QTLs, their contributions to phenotypic variation (h 2 ae ) were lower (less than 1%). However, a few ae effects were relatively large, for instance, the interaction between qnPH6-2 and five (E1-E5) environments accounted for 4.40% of the phenotypic variance (Table 3). It was known to all that QTLs with no significant genotype-by-environment interaction effects will be important for breeding environmentally stable varieties. However, QTLs with relatively large effects interacted with environments more or less. Moreover, most of the ae effects exhibited the same (positive or negative) effect direction as those of the main effects of the QTLs, while some of them showed the opposite. For example, qLL6-1 decreased the length of the largest leaf, whereas its interaction effects with environment E4 and E6 increased the value of the trait.

Additive-by-additive epistasis and epistasis-by-environment interaction effects
Two-dimensional genome scan detected 21 digenic epistatic QTL pairs with additive-by-additive epistasis (aa) effects and/or epistasisby-environment interaction (aae) effects for six traits, in which 8 pairs had aa effects only, 4 pairs had aae effects only and 9 pairs had both aa and aae effects. However, 21 epistatic QTLs which involved 37 loci contributed insignificant additive-additive epistasis effects (Table 4). What's more, the aae may performed oppositely in different environments. For example, the environment-specific epistatic effect (aae) between qnLN2 and qnLN4-2 in E3/E6 was in the same direction with that of the aa effect, while its interaction with E5 effected in opposite. Both aa and aae effects contributed limited magnitude to phenotypic variance, mostly less than 1% (Table 4).
Overall, for the traits except nLN, environmental (E) effects explained the biggest part of the phenotypic variance which ranged from 31.65% to 47.82%, followed by genetic (G) main effects and Linkage maps based on the reference genome. The linkage maps were constructed with a total of 46,324 markers. A total length of 3334.88 cM was mapped with 7,107 bin markers, and average distance was 0.469 cM.
genotype-by-environment interaction (GE) effects. While for nLN, the genetic effect was the major component of the phenotypic variance, wherein the additive effect of qnLN6-1 contributed a large part to the total variation (Supplementary Table 6).

Breeding potential of predicted lines
In terms of general genetic main effects, P2 (Y3) performed much better than P1 (K326) in most agronomic traits ignoring gene by environment interaction (Supplementary Table 7

Prediction of candidate gene in major QTLs
On the one hand, the genetic correlations between some traits were especially high with coefficients over 0.5 (Figure 1), indicating that there might exist pleiotropic genes accounting for it. On the other hand, six major QTLs of these traits happened to be mapped in two regions, where pleiotropic genes might harbor and it was more possible to find potential candidate genes from these regions.
The qnLN6-1 (h 2 a = 34:80 % ), qSG6-1 (h 2 a = 10:34 % )and qnPH6-2 (h 2 a = 4:21 % )were mapped on the same genetic position (Supplementary Table 5), closely correlated with the markers PT50965 and SNP_0386058_322; they were determined to be located on the region from 8,165,328 bp to 8,979,225 bp on Nt16 based on the K326 reference genome reported by Edwards et al. (2017). Comparative mapping of the K326 reference genome predicted the existence of a total of 19 genes from 81.65 Mb to 89.79 Mb on Nt16 (Supplementary Table 8). Among them, one gene  ae , the proportion of phenotypic variance explained by the additive-by-environment interaction. The critical F-values to declare QTL with significant single locus effects are 4.1 for nPH, nLN, SG, LL, LW, and 4.0 for IL, respectively. *, **, *** denote significance level at 0.05, 0.01 and 0.005, respectively. Abbreviations of traits are same as those in Table 1. Nt16g00284.1 encoding a protein with high homology to FRAGILE FIBER3 (FRA3) in Arabidopsis thaliana (Zhong et al., 2004) was considered as the candidate pleiotropic gene for trait nLN, SG and nPH. FRA3 encodes a type II 5PTase, plays an essential role in the secondary wall synthesis in fiber cells and xylem vessels and is highly expressed in fiber cells and vascular tissues in stems.
Another group of major-effect QTLs, qnPH6-5 (h 2 a = 12:68 % ), qLW6-2 (h 2 a = 6:05 % )and qIL6-3 (h 2 a = 4:80 % ), were found to be at the region between markers TM23680 and SNP_0000403_34752 simultaneously. A total of 173 genes were annotated on this target region occupied from 24,110,108 bp to 39,896,452 bp on Nt16 (Supplementary Table 8). We finally screened out three candidate genes: Nt16g00767.1 gene, which encodes a protein highly homologous to BRASSINOSTEROID INSENSITIVE 1 (BRI1) in Arabidopsis thaliana (Domagalska et al., 2007); Nt16g00853.1 gene, which encodes a protein with high homology to BRASSINOSTEROID-RESPONSIVE RING-H2 (BRH1) in Arabidopsis thaliana ; Nt16g00877.1 gene, which encodes a protein highly homologous to SOSEKI2 (SOK2) in Arabidopsis thaliana (Finnegan et al., 2004). BRI1 appears to be involved in the autonomous pathway that regulates the transition to flowering, primarily through its effects on FLOWERING LOCUS C (FLC) expression levels; BRH1 mediates BRs signaling pathway, which also influence FLC expression; SOK2 is one part of a three-gene cluster containing FLC, UFC and DFC, all of three genes are associated with FLC. Encoding a MADS-box transcription factor, FLC is a major repressor of flowering in Arabidopsis (Deng et al., 2011), therefore, we speculated that the extension of vegetative stage might prolong the period of transition and provide more time for stems and leaves to thrive.

Discussion
We have known that the genetic mechanisms underlying important agronomic traits in tobacco are not well-characterized owing to the genome complexity and limited molecular resources. As SNP is one of the most abundant forms of genetic variation (Ganal et al., 2009), a high-resolution genetic map, containing 45,081 SNPs aae , the proportion of phenotypic variance explained by the epistasis-by-environment interaction. The critical F-value to declare epistatic QTLs with significant epistatic effects is 3.5 for nPH, 3.6 for nLN, 3.2 for SG, 3.1 for IL, 3.2 for LL and 3.3 for LW. *, **, *** denote significance level at 0.05, 0.01 and 0.005, respectively. Abbreviations of traits are the same as Table 1. markers covered 3486.78 cM with an average genetic distance of 0.495 cM, has been constructed by whole genome sequencing in tobacco (Tong et al., 2020). In this study, we further took informative genetic markers like indels and SSRs into account to explore the genetic architecture of agronomic traits in tobacco, which has been proved to be effective by numerous studies in various plants. While SSR markers have been contributing to the studies of tobacco over the past few decades (Bindler et al., 2007;Moon et al., 2009;Bindler et al., 2011;Tong et al., 2012;Yang et al., 2020), indels have not been investigated until recent years. In order to achieve more reliable results under a more thorough design, we finally merged 43,301 SNPs, 2086 indels and 937 SSRs together to construct an integrated higher-density linkage map covering 3334.88 cM with an average genetic distance of 0.469cM. Moreover, the QTL analysis based on this new linkage map indicated the indispensability of SSR markers (e.g. PT50965 and PT30412) in determining the position of QTLs and exploring genetic architectures of agronomic traits (Supplementary Table 3).
Considering of involvement of gene-gene, gene-environment interaction in the genetic variation of complex traits, we used a full-QTL model to integrate the effects of additive QTLs, epistasis and GE interactions into one mapping system provided by QTLNetwork2.0, which has not been taken into consideration in most QTL studies of agronomic traits in tobacco. The QTL main effects are treated as fixed, which is valid across different environment; while, the QTLenvironment interaction effects as random, which are deviation from the average effect of QTL effects across all environments, zero summation of which is assumed. Therefore, some positive or negative QTL-environment interaction effects (ae, aae) would be observed in results. For six traits, a total of 17 epistasis QTL pairs were detected. These results proved that non-additive epistatic interactions are an important contributor to trait variation, which has been documented by numerous studies before (Kim and Rieseberg, 2001;Xing et al., 2002;Kroymann and Mitchell-Olds, 2005;Malmberg et al., 2005). What's more, we also identified some environment-specific QTL effects (ae and aae) under multiple environments, which could be utilized in genetic improvement of tobacco to breed environmentspecific or strong environment adaptability elite varieties in future breeding program. Compared with those models with QTL additive effects only, this model indeed offered us new insights into understanding of genetic architecture of quantitative traits. However, this study found that the additive effects were the main force accounting for heritability and provided guidance for further studies and utilization of the QTLs, such as, fine mapping, cloning and marker-assisted selection. Such as qnPH6-5, qnLN6-1, qnLN6-3, qSG6-1, qIL6-3 and qLW6-2, they contributed relatively larger additive effects compared with additive-environment interaction effects, indicating selection on them is valid across environments; while, qnPH6-2 exhibited similar magnitude of a and ae effects, indicating it could be selectively utilized in different environment. As we conjectured that nLN, nPH, SG, IL, LW were controlled by several major effect QTLs/genes, especially for nLN, whose main effects were considerable compared with that of environment interaction; therefore, these major QTLs are promising and should be paid more attention in following-up utilization in breeding higheryield varieties.
Based on the results of QTL mapping, mining candidate genes controlling the traits could be confined to the regions of QTLs. However, based only on preliminary mapping population (RILs) like our study, it is still a challenge to distinguish candidate genes of target trait from a lot of annotated genes in QTL regions. Therefore, further studies such as fine mapping using secondary mapping populations is required to narrow down the target QTL. Including single segment substitution lines (SSSLs), near isogenic lines (NILs) and chromosome segment substitution lines (CSSLs), secondary mapping populations are identical to the recurrent parent for all trait loci except the target QTLs, which can eliminate the interference of genetic background and improve the sensitivity and accuracy of miming and prediction of candidate genes in QTL. What's more, functional validation of candidate genes with molecular biology techniques on the levels of gene expressions, proteins and metabolites should be implemented to understand the genetic regulation mechanism of traits, which is valuable to guide us for designing superior genotype with better performance of agronomic traits.

Data availability statement
The data presented in the study are deposited in the European Nucleotide Archive (ENA) repository, accession number PRJEB59363 (https://www.ebi.ac.uk/ena/browser/view/PRJEB59363). study: performed field trials and provided data including genotypic and phenotypic data.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.